Abstract. For a sequence of dynamic optimization problems, we aim at discussing
a notion of consistency over time. This notion can be informally introduced
as follows. At the very first time step $t_0$, the decision maker formulates
an optimization problem that yields optimal decision rules for all the forthcoming
time steps $t_0, t_1, \dots, T$; at the next time step $t_1$, he is able to formulate
a new optimization problem starting at time $t_1$ that yields a new sequence
of optimal decision rules. This process can be continued until final time $T$ is
reached. A family of optimization problems formulated in this way is said to
be time consistent if the optimal strategies obtained when solving the original
problem remain optimal for all subsequent problems. The notion of time consistency,
well-known in the field of Economics, has been recently introduced
in the context of risk measures, notably by Artzner et al. (2007) and studied
in the Stochastic Programming framework by Shapiro (2009) and for Markov
Decision Processes (MDP) by Ruszczynski (2009). We here link this notion
with the concept of "state variable" in MDP, and show that a significant class
of dynamic optimization problems is dynamically consistent, provided that
an adequate state variable is chosen.



1. Introduction 

Stochastic Optimal Control (SOC) is concerned with sequential decision-making
under uncertainty. Consider a dynamical process that can be influenced by exogenous
noises as well as decisions one has to make at every time step. The decision
maker wants to optimize the behavior of the dynamical system (for instance,
minimize a production cost) over a certain time horizon. As the system evolves,
observations of the system are made; we here suppose that the decision maker is
able to keep in memory all the past observations. Naturally, it is generally more
profitable for him to adapt his decisions to the observations he makes of the system.
He is hence looking for strategies rather than simple decisions. In other words,
he is looking for mappings that send every possible history of the observations
to corresponding decisions. Because the number of time steps may be large, the
representation of such an object is in general numerically intractable.

However, an amount of information lighter than the whole history of the system 
is often sufficient to make an optimal decision. In the seminal work of Bellman 
(1957), the minimal information on the system that is necessary to make the optimal 
decision plays a crucial role; it is called the state variable (see Whittle, 1982, for 
a more formal definition). Moreover, the Dynamic Programming (DP) principle 
provides a way to compute the optimal strategies when the state space dimension
is not too large (see Bertsekas, 2000, for a broad overview on DP). The aim of this
paper is to establish a link between the concept of state variable and the notion of
time consistency^1.

The notion of dynamic consistency is well-known in the field of economics (see
Hammond, 1989) and has been introduced in the context of risk measures (see
Artzner et al., 2007; Riedel, 2004; Detlefsen and Scandolo, 2005; Cheridito et al.,
2006, for definitions and properties of coherent and consistent dynamic risk
measures). Dynamic consistency has then been studied in the stochastic programming
framework by Shapiro (2009) and for Markov Decision Processes by Ruszczynski
(2009). In this paper, we rather use the (almost equivalent) definition of time
consistency given by Ekeland and Lazrak (2006), which is more intuitive and seems
better suited to the framework of optimal control problems. In this context, the
property of time consistency is loosely stated as follows. The decision maker
formulates an optimization problem at time $t_0$ that yields a sequence of optimal decision
rules for $t_0$ and for the following time steps $t_1, \dots, t_N = T$. Then, at the next time
step $t_1$, he formulates a new problem starting at $t_1$ that yields a new sequence of
optimal decision rules from time steps $t_1$ to $T$. Suppose the process continues until
time $T$ is reached. The sequence of optimization problems is said to be dynamically
consistent if the optimal strategies obtained when solving the original problem at
time $t_0$ remain optimal for all subsequent problems. In other words, time consistency
means that strategies obtained by solving the problem at the very first stage
do not have to be questioned later on.

The notion of information here plays a crucial role. Indeed, we show in this paper
that a sequence of problems may be consistent for some information structure while
inconsistent for a different one. Consider for example a standard stochastic
optimization problem solvable using DP. We will observe that the sequence of problems
formulated after the original one at the later time steps is time consistent. Add
now a probabilistic constraint involving the state at the final time $T$. We will show
that such a constraint brings time inconsistency in the sense that optimal strategies
based on the usual state variable have to be reconsidered at each time step. This
is because, roughly speaking, a probabilistic constraint involves not only the state
variable values but their probabilistic distributions. Hence the only knowledge of
the usual state variable of the system is insufficient to write consistent problems at
subsequent time steps. So, in addition to the usual technical difficulties regarding
probabilistic constraints (mainly related to the non-convexity of the feasible set of
strategies), an additional problem arises in the dynamic case. We will see that, in
fact, this new matter comes from the information on which the optimal decision
is based. Therefore, with a well-suited state variable, the sequence of problems
regains dynamic consistency.

In §2, we carefully examine the notion of time consistency in the context of a
deterministic optimal control problem. The main ideas of the paper are explained there
and then extended, in §3, to a sequence of SOC problems. Next, in §4, we show
that simply adding a probability constraint (or, equivalently in our context, an 
expectation constraint) to the problem makes time consistency fall apart, when 
using the original state variable. We then establish that time consistency can be 
recovered provided an adequate state variable is chosen. We conclude that, for a 
broad class of SOC problems, time consistency has to be considered with respect 
to the notion of a state variable and of DP. 



^1 We use the terms "dynamically consistent" and "time consistent" interchangeably
to refer to the same notion.

2. A first example

We introduce sequential deterministic optimal control problems, indexed by time, 
and derive the notion of time consistency on this instance. We then illustrate the 
fact that the decision making process may be time consistent or not, depending on 
the information on which decisions are based. The discussion is informal, in the 
sense that we do not enter technical details regarding existence of the solutions for 
the problems we introduce. 

Let us consider a discrete and finite time horizon $t_0, \dots, t_N = T$.^2 The decision
maker has to optimize (according to a cost function we introduce below) the
management of an amount of stock $x_t$, which lies in some space $\mathbb{X}_t$, at every time
step $t = t_0, \dots, T$. Let $\mathbb{U}_t$ be some other space, for every time step $t = t_0, \dots, T-1$.
At each time step $t$, a decision $u_t \in \mathbb{U}_t$ has to be made. Then a cost $L_t$ is incurred by
the system, depending on the values of the control and on the auxiliary variable $x_t$
that we call the state of the system. This state variable is driven from time $t$ to
time $t+1$ by some dynamics $f_t : \mathbb{X}_t \times \mathbb{U}_t \to \mathbb{X}_{t+1}$. The aim of the decision maker
is to minimize the sum of the intermediate costs $L_t$ at all time steps plus a final
cost $K$.

The problem hence reads: 

(1a)    $\min_{x,u} \; \sum_{t=t_0}^{T-1} L_t(x_t, u_t) + K(x_T)$,

subject to the initial condition:

(1b)    $x_{t_0}$ given,

and dynamic constraints:

(1c)    $x_{t+1} = f_t(x_t, u_t), \quad \forall t = t_0, \dots, T-1$.

Note that here the decision at time t is taken knowing the current time step and 
the initial condition (the decision is generally termed "open loop"). A priori, there 
is no need for more information since the model is deterministic. 

Suppose a solution to this problem exists. This is a sequence of controls that we
denote by $u^*_{t_0,t_0}, \dots, u^*_{t_0,T-1}$, where the first index refers to the initial time step and
the second index refers to the time step for which the decision applies. Moreover,
we suppose a solution exists for each one of the natural subsequent problems, i.e.
for every $t_i = t_1, \dots, T-1$:

(2a)    $\min_{x,u} \; \sum_{t=t_i}^{T-1} L_t(x_t, u_t) + K(x_T)$,

(2b)    s.t. $x_{t_i}$ given,

(2c)    $x_{t+1} = f_t(x_t, u_t), \quad \forall t = t_i, \dots, T-1$.

We denote the solutions of these problems by $u^*_{t_i,t_i}, \dots, u^*_{t_i,T-1}$, for every time
step $t_i = t_1, \dots, T-1$. This notation however leaves implicit the fact that the
solutions generally depend on the initial condition $x_{t_i}$. We now make a first
observation.

Lemma 1 (Independence of the initial condition). In the very particular case
when the solution to Problem (1) and the solutions to Problems (2) for every time
step $t_i = t_1, \dots, T-1$ do not depend on the initial state conditions, problems are
dynamically consistent.



^2 where $t_i + 1 = t_{i+1}$.
Proof. Let us denote by $x^*_{t_0,t_i}$ the optimal value of the state variable within
Problem (1) at time $t_i$. If we suppose that solutions to Problems (2) do not depend
on the initial condition, then they are the same as the solutions obtained with the
initial condition $x^*_{t_0,t_i}$, namely $u^*_{t_0,t_i}, \dots, u^*_{t_0,T-1}$. In other words, the sequence of
decisions $u^*_{t_0,t_0}, \dots, u^*_{t_0,T-1}$ remains optimal for the subsequent problems starting
at a later date. □

This property is of course not true in general, but we see in Example 1 hereafter 
and in §3 that some very practical problems do have this surprising property. 

Example 1. Let us introduce, for every $t = t_0, \dots, T-1$, functions $l_t : \mathbb{U}_t \to \mathbb{R}$
and $f_t : \mathbb{U}_t \to \mathbb{R}$, and assume that $x_t$ is scalar. Let $K$ be a scalar constant and
consider the following deterministic optimal control problem:

    $\min_{x,u} \; \sum_{t=t_0}^{T-1} l_t(u_t)\, x_t + K x_T$,

    s.t. $x_{t_0}$ given,

    $x_{t+1} = f_t(u_t)\, x_t, \quad \forall t = t_0, \dots, T-1$.

Variables $x_t$ can be recursively replaced using dynamics $f_t$. Therefore, the above
optimization problem can be written:

    $\min_{u} \; \sum_{t=t_0}^{T-1} l_t(u_t)\, f_{t-1}(u_{t-1}) \cdots f_{t_0}(u_{t_0})\, x_{t_0} + K f_{T-1}(u_{T-1}) \cdots f_{t_0}(u_{t_0})\, x_{t_0}$.

Hence the optimal cost of the problem is linear with respect to the initial
condition $x_{t_0}$. Suppose that $x_{t_0}$ only takes positive values. Then the value of $x_{t_0}$ has
no influence on the minimizer (it only influences the optimal cost). The same
argument applies at subsequent time steps $t_i > t_0$ provided that dynamics are such
that $x_t$ remains positive for every time step $t = t_1, \dots, T$. Now, formulate the same
problem at a later date $t_i = t_1, \dots, T-1$, with initial condition $x_{t_i}$ given. By the
same token as for the first stage problem, the value of the initial condition $x_{t_i}$ has
no influence on the optimal controls. Assumptions made in Lemma 1 are fulfilled,
so that the time consistency property holds true for open-loop decisions without
reference to initial state conditions.
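
The independence property of Example 1 can be checked numerically on a toy instance. The following is only a sketch: the control set, the horizon, the functions `l` and `f` and the constant `K` are illustrative assumptions, not data from the paper. It enumerates all open-loop plans and compares the minimizers obtained for several positive initial conditions.

```python
from itertools import product

# Hypothetical instance of Example 1 (every number here is an assumption).
CONTROLS = (0, 1, 2)   # finite control set, identical at every time step
HORIZON = 3            # number of decisions u_{t_0}, ..., u_{T-1}
K = 2.0                # final cost coefficient


def l(t, u):
    """Running cost coefficient l_t(u)."""
    return (u - 1) ** 2 + 0.5 * t


def f(t, u):
    """Multiplicative dynamics coefficient f_t(u), positive so that x_t stays positive."""
    return 1.0 / (1.0 + u)


def cost(x0, controls):
    """Cost of an open-loop plan: sum_t l_t(u_t) x_t + K x_T."""
    x, total = x0, 0.0
    for t, u in enumerate(controls):
        total += l(t, u) * x
        x = f(t, u) * x
    return total + K * x


def best_plan(x0):
    """Brute-force minimizer over all open-loop plans for initial condition x0."""
    return min(product(CONTROLS, repeat=HORIZON), key=lambda us: cost(x0, us))
```

Since the cost is the initial condition times a factor that depends on the controls only, `best_plan` is expected to return the same plan for every positive `x0`, while the optimal cost scales with `x0`.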

Although, for the time being, this example may look very special, we will see 
later on that it is analogous to familiar SOC problems. 

As already noticed, the independence property of Lemma 1 does not hold in general.
Moreover, the deterministic formulation (1) comes in general from the representation
of a real-life process which may indeed be subject to unmodeled disturbances. Think
of an industrial context, for example, in which sequential decisions are taken in the
following manner.

    • At time $t_0$, Problem (1) is solved. One obtains a decision $u^*_{t_0,t_0}$ to apply at
      time $t_0$, as well as decisions $u^*_{t_0,t_1}, \dots, u^*_{t_0,T-1}$ for future time steps.

    • At time $t_1$, one formulates and solves the problem starting at time $t_1$ with
      initial condition $x_{t_1} = f_{t_0}(x_{t_0}, u^*_{t_0,t_0}) + \varepsilon_{t_1}$, $\varepsilon_{t_1}$ being some perturbation of
      the model. There is no reason not to use the observation of the actual value
      of the variable $x_{t_1}$ at time $t_1$ as long as we have it at our disposal.

    • Hence a decision $u^*_{t_1,t_1}$ is obtained, which is different from the initially
      obtained optimal decision $u^*_{t_0,t_1}$ (once again, in general).

    • The same process continues at times $t_2, \dots, T-1$.

Let us now state the two following lemmas. 
Lemma 2 (True deterministic world). If the deterministic model is actually exact,
i.e. if all perturbations $\varepsilon_{t_i}$ introduced above equal zero, then Problems (2) with
initial conditions $x_{t_i} = x^*_{t_0,t_i} = f_{t_{i-1}}(x^*_{t_0,t_{i-1}}, u^*_{t_0,t_{i-1}})$ are dynamically consistent.

Proof. Since decisions $u^*_{t_0,t_0}, \dots, u^*_{t_0,T-1}$ are optimal for Problem (1), it follows that
decisions $u^*_{t_0,t_1}, \dots, u^*_{t_0,T-1}$ are optimal for the problem:

    $\min \; L_{t_0}(x_{t_0}, u^*_{t_0,t_0}) + \sum_{t=t_1}^{T-1} L_t(x_t, u_t) + K(x_T)$,

    s.t. $x_{t_1} = f_{t_0}(x_{t_0}, u^*_{t_0,t_0})$,

    $x_{t+1} = f_t(x_t, u_t), \quad \forall t = t_1, \dots, T-1$,

which has the same arg min as Problem (2) at time $t_1$. The same argument applies
recursively for subsequent time steps. □

It is clear that the assumption of Lemma 2 is not satisfied in real life. Therefore,
adding disturbances to the problem seems to bring inconsistency to the sequence of
optimization problems. Decisions that are optimal for the first stage problem do not
remain optimal for the subsequent problems if we do not let decisions depend on the
initial conditions.
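
The effect of an unmodeled perturbation on open-loop plans can be illustrated on a small deterministic problem. This is a sketch under assumed data: the integer dynamics `f`, the costs `L` and `K`, the control set and the perturbation of size 2 are all hypothetical choices made for the illustration.

```python
from itertools import product

CONTROLS = (-1, 0, 1)   # hypothetical finite control set
T = 2                   # decisions are taken at times 0 and 1, final time is 2


def L(x, u):
    """Running cost (illustrative choice)."""
    return x * x + u * u


def K(x):
    """Final cost (illustrative choice)."""
    return x * x


def f(x, u):
    """Nominal dynamics of the model."""
    return x + u


def plan_cost(x, controls):
    """Cost of applying the given open-loop controls from state x."""
    total = 0
    for u in controls:
        total += L(x, u)
        x = f(x, u)
    return total + K(x)


def open_loop(start, x):
    """Optimal open-loop controls for the problem starting at time `start` in state x."""
    return min(product(CONTROLS, repeat=T - start), key=lambda us: plan_cost(x, us))


plan0 = open_loop(0, 1)                    # plan computed at the initial time
x1_nominal = f(1, plan0[0])                # state predicted by the exact model
exact = open_loop(1, x1_nominal)           # re-planning without perturbation
perturbed = open_loop(1, x1_nominal + 2)   # re-planning after a perturbation of +2
```

On this instance the re-planned decision coincides with the tail of the initial plan when the model is exact (the situation of Lemma 2) and differs from it once the state has been perturbed.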

In fact, as it is stated next, time consistency is recovered provided we let decisions 
depend upon the right information. 

Lemma 3 (Right amount of information). Suppose that one is looking for strategies
$(\Phi^*_{t_0,t_0}, \dots, \Phi^*_{t_0,T-1})$ as feedback functions depending on the variable $x$. Then
Problems (2) are time consistent for every time step $t = t_0, \dots, T-1$.

Proof. The result is a direct application of the DP principle, which states that there
exists such a feedback function $\Phi^*_{t_0,t_i}$ that is optimal for Problem (1) and is still
optimal for Problem (2) at time $t_i$, whatever the initial condition $x_{t_i}$ is. □

We thus retrieve the dynamic consistency property provided that we use the
feedback functions $\Phi^*_{t_0,t}$ rather than the controls $u^*_{t_0,t}$. In other words, problems
are dynamically consistent as soon as the control strategy is based on a sufficiently
rich amount of information (time instant $t$ and state variable $x$ in the deterministic
case).
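
A minimal sketch of Lemma 3 on a toy deterministic problem follows. The dynamics, costs and control set are assumptions chosen for the illustration. The feedback is computed once by backward induction and can then be applied from any time and any state, perturbed or not.

```python
from functools import lru_cache

CONTROLS = (-1, 0, 1)   # hypothetical finite control set
T = 2                   # final time


def L(x, u):
    """Running cost (illustrative choice)."""
    return x * x + u * u


def K(x):
    """Final cost (illustrative choice)."""
    return x * x


def f(x, u):
    """Deterministic dynamics (illustrative choice)."""
    return x + u


@lru_cache(maxsize=None)
def value(t, x):
    """Bellman value V_t(x) of the problem starting at time t in state x."""
    if t == T:
        return K(x)
    return min(L(x, u) + value(t + 1, f(x, u)) for u in CONTROLS)


def feedback(t, x):
    """A minimizer in the Bellman equation, i.e. an optimal feedback at (t, x)."""
    return min(CONTROLS, key=lambda u: L(x, u) + value(t + 1, f(x, u)))


def rollout(t, x):
    """Apply the feedback from time t and state x; return the decisions taken."""
    decisions = []
    while t < T:
        u = feedback(t, x)
        decisions.append(u)
        x, t = f(x, u), t + 1
    return tuple(decisions)
```

For instance, `rollout(0, 1)` reproduces the open-loop plan of the initial problem, while `rollout(1, 2)` gives the optimal decision from a state that the initial plan never visits, without solving a new problem.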

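There is of course an obvious link between these optimal strategies and the
controls $(u^*_{t_0,t_0}, \dots, u^*_{t_0,T-1})$, namely:

    $u^*_{t_0,t} = \Phi^*_{t_0,t}(x^*_{t_0,t}), \quad \forall t = t_0, \dots, T-1$,

where

    $x^*_{t_0,t_0} = x_{t_0}$,

    $x^*_{t_0,t+1} = f_t\big(x^*_{t_0,t}, \Phi^*_{t_0,t}(x^*_{t_0,t})\big), \quad \forall t = t_0, \dots, T-1$.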



The considerations we made so far seem to be somewhat trivial. However, we
shall observe that for SOC problems, which may seem more complicated at first
sight, the same considerations remain true. Most of the time, decision making
processes are time consistent, provided we choose the correct information on which
decisions are based.

3. Stochastic optimal control without constraints

We now consider a more general case in which a controlled dynamical system 
is influenced by modeled exogenous disturbances. The decision maker has to find 
strategies to drive the system so as to minimize some objective function over a 
certain time horizon. This is a sequential decision making process on which we 
can state the question of dynamic consistency. As in the previous example, the 
family of optimization problems is derived from the original one by truncating the 
dynamics and the cost function (the final time step T remains unchanged in each 
problem), and strategies are defined relying on the same information structure as 
in the original problem. In the sequel, random variables will be denoted using bold 
letters. 

3.1. The classical case. Consider a dynamical system characterized by state^3
variables $\mathbf{X} = (\mathbf{X}_t)_{t=t_0,\dots,T}$, where $\mathbf{X}_t$ takes values in $\mathbb{X}_t$. The system can be
influenced by control variables $\mathbf{U} = (\mathbf{U}_t)_{t=t_0,\dots,T-1}$ and by exogenous noise
variables $\mathbf{W} = (\mathbf{W}_t)_{t=t_0,\dots,T}$ ($\mathbf{U}_t$ and $\mathbf{W}_t$ taking values in $\mathbb{U}_t$ and $\mathbb{W}_t$ respectively).
All random variables are defined on a probability space $(\Omega, \mathcal{A}, \mathbb{P})$. The problem we
consider consists in minimizing the expectation of a sum of costs depending on the
state, the control and the noise variables over a discrete finite time horizon. The
state variable evolves with respect to some dynamics that depend on the current
state, noise and control values. The problem starting at $t_0$ reads:^4



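(3a)    $\min_{\mathbf{X},\mathbf{U}} \; \mathbb{E}\Big( \sum_{t=t_0}^{T-1} L_t(\mathbf{X}_t, \mathbf{U}_t, \mathbf{W}_{t+1}) + K(\mathbf{X}_T) \Big)$,

(3b)    s.t. $\mathbf{X}_{t_0}$ given,

(3c)    $\mathbf{X}_{t+1} = f_t(\mathbf{X}_t, \mathbf{U}_t, \mathbf{W}_{t+1}), \quad \forall t = t_0, \dots, T-1$,

(3d)    $\mathbf{U}_t \prec \mathbf{X}_{t_0}, \mathbf{W}_{t_1}, \dots, \mathbf{W}_t, \quad \forall t = t_0, \dots, T-1$.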



Noises that affect the system can be correlated through time. A general approach
in optimal control consists in including all necessary information in the variable $\mathbf{X}$
so that variables $\mathbf{W}_{t_1}, \dots, \mathbf{W}_T$ are independent through time. At most, one has to
include all the past values of the noise variable within the variable $\mathbf{X}$. We hence
make the following assumption.

Assumption 1 (Markovian setting). Noise variables $\mathbf{X}_{t_0}, \mathbf{W}_{t_1}, \dots, \mathbf{W}_T$ are
independent.

Using Assumption 1, it is well known (see Bertsekas, 2000) that: 

    • there is no loss of optimality in looking for the optimal strategy $\mathbf{U}_t$ at
      time $t$ as a feedback function depending on the state variable $\mathbf{X}_t$, i.e. as a
      (measurable) function of the form $\Phi_{t_0,t} : \mathbb{X}_t \to \mathbb{U}_t$;

    • the optimal strategies $\Phi^*_{t_0,t_0}, \dots, \Phi^*_{t_0,T-1}$ can be obtained by solving the
      classical DP equation. Let $V_t(x)$ denote the optimal cost when being at
      time step $t$ with state value $x$; this equation reads:

          $V_T(x) = K(x)$,

          $V_t(x) = \min_{u \in \mathbb{U}_t} \mathbb{E}\big( L_t(x, u, \mathbf{W}_{t+1}) + V_{t+1}(f_t(x, u, \mathbf{W}_{t+1})) \big)$.



^3 The use of the terminology "state" is somewhat abusive until we make Assumption 1.
^4 We here use the notations $\sim$ for "is distributed according to" and $\prec$ for "is measurable with
respect to".

We call this case the classical case. It is clear while inspecting the DP equation
that optimal strategies $\Phi^*_{t_0,t_i}, \dots, \Phi^*_{t_0,T-1}$ remain optimal for the subsequent
optimization problems:

(4a)    $\min \; \mathbb{E}\Big( \sum_{t=t_i}^{T-1} L_t(\mathbf{X}_t, \mathbf{U}_t, \mathbf{W}_{t+1}) + K(\mathbf{X}_T) \Big)$,

(4b)    s.t. $\mathbf{X}_{t_i}$ given,

(4c)    $\mathbf{X}_{t+1} = f_t(\mathbf{X}_t, \mathbf{U}_t, \mathbf{W}_{t+1}), \quad \forall t = t_i, \dots, T-1$,

(4d)    $\mathbf{U}_t \prec \mathbf{X}_{t_i}, \mathbf{W}_{t_{i+1}}, \dots, \mathbf{W}_t, \quad \forall t = t_i, \dots, T-1$,

for every $t_i = t_1, \dots, T-1$. In other words, these problems are dynamically
consistent provided the information variable at time $t$ contains at least the state
variable $\mathbf{X}_t$. While building an analogy with properties described in the deterministic
example in §2, the reader should be aware that the case we consider here is closer
to Lemma 1 than to Lemma 3, as we explain now in more detail.
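
The DP equation of the classical case can be sketched on a finite toy problem. Everything below is an assumption made for illustration (two-point noise law, integer dynamics, quadratic costs); exact rational arithmetic avoids rounding issues. The helper `evaluate` computes the expected cost-to-go of any feedback policy, so that one can check that the DP feedback attains $V_t(x)$ from every starting time and state.

```python
from fractions import Fraction
from functools import lru_cache

CONTROLS = (-1, 0, 1)                              # hypothetical control set
NOISE = {-1: Fraction(1, 2), 1: Fraction(1, 2)}    # assumed law of each noise W_{t+1}
T = 2                                              # final time


def L(x, u, w):
    """Running cost (illustrative; it does not depend on the noise here)."""
    return x * x + u * u


def K(x):
    """Final cost (illustrative choice)."""
    return x * x


def f(x, u, w):
    """Dynamics driven by state, control and noise (illustrative choice)."""
    return x + u + w


def q_value(t, x, u):
    """E[ L_t(x, u, W) + V_{t+1}(f_t(x, u, W)) ]."""
    return sum(p * (L(x, u, w) + value(t + 1, f(x, u, w))) for w, p in NOISE.items())


@lru_cache(maxsize=None)
def value(t, x):
    """Bellman value V_t(x): V_T = K and V_t(x) = min_u q_value(t, x, u)."""
    if t == T:
        return Fraction(K(x))
    return min(q_value(t, x, u) for u in CONTROLS)


def feedback(t, x):
    """A minimizer of the DP equation at (t, x)."""
    return min(CONTROLS, key=lambda u: q_value(t, x, u))


def evaluate(policy, t, x):
    """Expected cost-to-go of a feedback policy(t, x) from time t and state x."""
    if t == T:
        return Fraction(K(x))
    u = policy(t, x)
    return sum(p * (L(x, u, w) + evaluate(policy, t + 1, f(x, u, w)))
               for w, p in NOISE.items())
```

The same function `feedback` is used whether the problem starts at time 0 or at time 1, which is the time consistency property of the classical case on this instance.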

3.2. The distributed formulation. Another consequence of the previous DP
equation for Problem (3) is that the optimal feedback functions do not depend
on the initial condition $\mathbf{X}_{t_0}$. The probability law of $\mathbf{X}_{t_0}$ only affects the optimal
cost value, but not its arg min. In fact, we are within the same framework as
in Example 1. Indeed, Problem (3) can be written as a deterministic distributed
optimal control problem involving the probability laws of the state variable, the
dynamics of which are given by the so-called Fokker-Planck equation. Let us detail
this last formulation (see Witsenhausen, 1973).

Let $\Psi_t$ be the space of $\mathbb{R}$-valued functions on $\mathbb{X}_t$. Denoting by $\mu_{t_0}$ the probability
law of the first stage state $\mathbf{X}_{t_0}$, and given feedback laws $\Phi_t : \mathbb{X}_t \to \mathbb{U}_t$ for every
time step $t = t_0, \dots, T-1$, we define the operator $A_t^{\Phi_t} : \Psi_{t+1} \to \Psi_t$, which is meant
to integrate cost functions backwards in time, as^5:

    $\big(A_t^{\Phi_t} \psi_{t+1}\big)(\cdot) = \mathbb{E}\big( \psi_{t+1} \circ f_t(\cdot, \Phi_t(\cdot), \mathbf{W}_{t+1}) \big)$.

Given a feedback function $\Phi_t$ and a cost function $\psi_{t+1} \in \Psi_{t+1}$, for every $x \in \mathbb{X}_t$
the value $(A_t^{\Phi_t} \psi_{t+1})(x)$ is the expected value of $\psi_{t+1}(\mathbf{X}_{t+1})$, knowing that $\mathbf{X}_t = x$
and that feedback $\Phi_t$ is used. Thanks to a duality argument, the Fokker-Planck
equation, which describes the evolution of the state probability law (as driven by
the chosen feedback laws $\Phi_t$), is obtained:



    $\mu_{t+1} = \big(A_t^{\Phi_t}\big)^\dagger \mu_t$,

with $(A_t^{\Phi_t})^\dagger$ being the adjoint operator of $A_t^{\Phi_t}$. Next we introduce the operator
$\Lambda_t^{\Phi_t} : \mathbb{X}_t \to \mathbb{R}$:

    $\Lambda_t^{\Phi_t}(\cdot) = \mathbb{E}\big( L_t(\cdot, \Phi_t(\cdot), \mathbf{W}_{t+1}) \big)$,

which is meant to be the expected cost at time $t$ for each possible state value when
feedback function $\Phi_t$ is applied. Let us define, for every $\psi_t \in \Psi_t$ and every
probability law $\mu_t$ on $\mathbb{X}_t$, $\langle \psi_t, \mu_t \rangle$ as $\mathbb{E}(\psi_t(\mathbf{X}_t))$ when $\mathbf{X}_t$ is distributed according to $\mu_t$.
We can now write a deterministic infinite-dimensional optimal control problem that
is equivalent to Problem (3):

    $\min_{\Phi} \; \sum_{t=t_0}^{T-1} \langle \Lambda_t^{\Phi_t}, \mu_t \rangle + \langle K, \mu_T \rangle$,

    s.t. $\mu_{t_0}$ given,

    $\mu_{t+1} = \big(A_t^{\Phi_t}\big)^\dagger \mu_t, \quad \forall t = t_0, \dots, T-1$.

^5 We do not aim at discussing technical details concerning integrability here. We suppose that
operators we introduce are well-defined.



Remark 1. An alternative formulation is:

    $\min_{\Phi} \; \langle \psi_{t_0}, \mu_{t_0} \rangle$,

    s.t. $\psi_T = K$,

    $\psi_t = A_t^{\Phi_t} \psi_{t+1} + \Lambda_t^{\Phi_t}, \quad \forall t = T-1, \dots, t_0$.

This may be called "the backward formulation" since the "state" $\psi_t(\cdot)$ follows
an affine dynamics which is backward in time, with an initial-only cost function
(whereas the previous forward formulation follows a forward linear dynamics with
an integral + final cost function). Both formulations are infinite-dimensional linear
programming problems which are dual of each other. The functions $\mu(\cdot)$ and $\psi(\cdot)$
are the distributed state and/or co-state (according to which one is considered the
primal problem) of this distributed deterministic optimal control problem of which
$\Phi$ is the distributed control.
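
On a finite state space the operators above reduce to elementary sums, which gives a simple way to check the forward and backward formulations against each other. The sketch below rests on assumed data: a three-point state space, a two-point noise law, and arbitrary dynamics and costs. A feedback law is encoded as a tuple `phi` with `phi[x]` the control used in state `x`, and laws and functions are lists indexed by the state.

```python
from fractions import Fraction

STATES = (0, 1, 2)                                 # hypothetical finite state space
NOISE = {0: Fraction(1, 2), 1: Fraction(1, 2)}     # assumed law of each noise


def f(x, u, w):
    """Dynamics on the finite state space (illustrative choice)."""
    return (x + u + w) % 3


def L(x, u, w):
    """Running cost (illustrative choice)."""
    return x + 2 * u * w


def K(x):
    """Final cost (illustrative choice)."""
    return x * x


def apply_A(phi, psi):
    """(A psi)(x) = E[ psi(f(x, phi[x], W)) ]: integrates psi one step backwards."""
    return [sum(p * psi[f(x, phi[x], w)] for w, p in NOISE.items()) for x in STATES]


def apply_A_adjoint(phi, mu):
    """Fokker-Planck step: law of f(X, phi[X], W) when X has law mu."""
    out = [Fraction(0)] * len(STATES)
    for x in STATES:
        for w, p in NOISE.items():
            out[f(x, phi[x], w)] += mu[x] * p
    return out


def Lambda(phi):
    """Expected running cost in each state under feedback phi."""
    return [sum(p * L(x, phi[x], w) for w, p in NOISE.items()) for x in STATES]


def pairing(psi, mu):
    """<psi, mu> = E[psi(X)] when X has law mu."""
    return sum(a * b for a, b in zip(psi, mu))


def forward_cost(phis, mu):
    """Forward formulation: propagate the law and accumulate <Lambda, mu_t>."""
    total = Fraction(0)
    for phi in phis:
        total += pairing(Lambda(phi), mu)
        mu = apply_A_adjoint(phi, mu)
    return total + pairing([K(x) for x in STATES], mu)


def backward_cost(phis, mu):
    """Backward formulation: psi_T = K, psi_t = A psi_{t+1} + Lambda, cost <psi_{t_0}, mu>."""
    psi = [Fraction(K(x)) for x in STATES]
    for phi in reversed(phis):
        psi = [a + b for a, b in zip(apply_A(phi, psi), Lambda(phi))]
    return pairing(psi, mu)
```

For any sequence of feedback laws and any initial law, `forward_cost` and `backward_cost` should return the same number, which is the duality between the two formulations restricted to a fixed feedback.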

Probability laws $\mu_t$ are by definition positive and appear only in a multiplicative
manner in the problem. Hence we are in a case similar to Example 1. The main
difference is rather technical: since we here have probability laws instead of scalars,
we need to apply, backwards in time, theorems on the interchange of expectation and
minimization in order to prove that the solution of the problem actually does not
depend on the initial condition $\mu_{t_0}$. Indeed, suppose that $\mu_{T-1}$ is given at time
step $T-1$. Then the innermost optimization problem reads:

    $\min_{\Phi_{T-1}} \; \langle \Lambda_{T-1}^{\Phi_{T-1}}, \mu_{T-1} \rangle + \langle K, \mu_T \rangle$,

    s.t. $\mu_T = \big(A_{T-1}^{\Phi_{T-1}}\big)^\dagger \mu_{T-1}$,

which is equivalent to:

    $\min_{\Phi_{T-1}} \; \big\langle \Lambda_{T-1}^{\Phi_{T-1}} + A_{T-1}^{\Phi_{T-1}} K, \; \mu_{T-1} \big\rangle$.

The point is that the function $\Lambda_{T-1}^{\Phi_{T-1}} + A_{T-1}^{\Phi_{T-1}} K$ and the law $\mu_{T-1}$ are both defined on $\mathbb{X}_{T-1}$
and that the minimization has to be done "$x$ by $x$", so that we are in the case of
Example 1 for every $x$. Therefore, the minimizer does not depend on $\mu_{T-1}$. For a
rigorous proof, one needs several technical assumptions concerning measurability,
which we do not intend to discuss in this paper (see Rockafellar and Wets, 1998,
Theorem 14.60). The same argument applies recursively to every time step before
$T-1$ so that, at time $t_0$, the initial condition $\mu_{t_0}$ only influences the optimal
cost of the problem, but not the argument of the minimum itself (here, the feedback
laws $\Phi^*_{t_0,t}$).

Hence, following Lemma 1, Problems (4) are naturally time consistent when
strategies are searched as feedback functions on $\mathbf{X}_t$ only. It thus appears that the
rather general class of stochastic optimal control problems shaped as Problem (3)
is in fact very specific. However, such a property does not remain true when adding
new ingredients in the problem, as we show in the next section.

4. Stochastic optimal control with constraints

We now give an example in which the state variable, as defined notably by Whittle
(1982), cannot be reduced to variable $\mathbf{X}_t$ as above. Let us make Problem (3) more
complex by adding to the model a probability constraint applying to the final time
step $T$. For instance, we want the system to be in a certain state at the final time
step with a given probability:

    $\mathbb{P}\big(h(\mathbf{X}_T) \ge b\big) \le \pi$.

Such chance constraints can equivalently be modelled as an expectation constraint
in the following way:

    $\mathbb{E}\big(\mathbf{1}_{\{h(\mathbf{X}_T) \ge b\}}\big) \le \pi$,

where $\mathbf{1}_A$ refers to the indicator function of set $A$. Note however that chance
constraints bring important theoretical and numerical difficulties, notably regarding
connectedness and convexity of the feasible set of controls, even in the static case.
The interested reader should refer to the work of Prekopa (1995), and to the
handbook by Ruszczynski and Shapiro (2003, Ch.5) for mathematical properties
and numerical algorithms in Probabilistic Programming (see also Henrion, 2002;
Henrion and Strugarek, 2008, for related studies). We do not discuss them here.
The difficulty we are interested in is common to both chance and expectation
constraints. This is why we concentrate in the sequel on adding an expectation
constraint to Problem (3) of the form:

    $\mathbb{E}\big(g(\mathbf{X}_T)\big) \le a$.

The reader familiar with chance constraints might want to see the level $a$ as a level
of probability that one wants to satisfy for a certain event at the final time step.
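
The rewriting of a chance constraint as an expectation constraint can be checked directly on a finite law. The law of the final state, the function `h` and the threshold `b` below are hypothetical values used only for this sketch.

```python
from fractions import Fraction

# Hypothetical finite law of the final state: value -> probability.
LAW = {0: Fraction(1, 2), 1: Fraction(1, 4), 3: Fraction(1, 4)}


def h(x):
    """Illustrative function appearing in the chance constraint."""
    return 2 * x


def expectation(g, law):
    """E[g(X)] for X distributed according to `law`."""
    return sum(p * g(x) for x, p in law.items())


def probability(event, law):
    """P(event(X)) for X distributed according to `law`."""
    return sum(p for x, p in law.items() if event(x))


def indicator(event):
    """Indicator function of an event, as a function of the state."""
    return lambda x: 1 if event(x) else 0


b = 2


def event(x):
    """The event {h(x) >= b}."""
    return h(x) >= b


chance = probability(event, LAW)
as_expectation = expectation(indicator(event), LAW)
```

Both quantities coincide, so the constraint on `chance` can be handled as a constraint on the expectation of `g = indicator(event)`.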

We now show that when adding such an expectation constraint, the dynamic
consistency property falls apart. More precisely, the sequence of SOC problems
is not time consistent anymore when using the usual state variable. Nevertheless,
we observe that the lack of consistency comes from an inappropriate choice for the
state variable. By choosing the appropriate state variable, one regains dynamic
consistency.

4.1. Problem setting. We now go back to the constrained formulation and introduce
a measurable function $g : \mathbb{X}_T \to \mathbb{R}$ and $a \in \mathbb{R}$. We consider Problem (3) with
the additional final expectation constraint:

    $\mathbb{E}\big(g(\mathbf{X}_T)\big) \le a$.

The subsequent optimization problems formulated at an initial time $t_i > t_0$ are
naturally deduced from this problem. The level a of the expectation constraint 
remains the same for every problem. One has to be aware that this corresponds to 
a (naive) modelling choice for the family of optimization problems under consid- 
eration. Such a choice is questionable since the perception of the constraint may 
evolve over time. 

Suppose there exists a solution for the problem at $t_0$. As previously, we are
looking for the optimal control at time $t$ as a feedback function $\Phi^*_{t_0,t}$ depending on
the variable $\mathbf{X}_t$. The first index $t_0$ refers to the time step at which the problem
is stated, while the second index $t$ refers to the time step at which the decision is
taken.

One has to be aware that these solutions now implicitly depend on the initial
condition $\mathbf{X}_{t_0}$. Indeed, let $\mu_t$ be the probability law of $\mathbf{X}_t$. The final expectation
constraint can be written $\langle g, \mu_T \rangle \le a$, so that the equivalent distributed formulation
of the initial time problem is:

    $\min_{\Phi} \; \sum_{t=t_0}^{T-1} \langle \Lambda_t^{\Phi_t}, \mu_t \rangle + \langle K, \mu_T \rangle$,

subject to the Fokker-Planck dynamics:

    $\mu_{t+1} = \big(A_t^{\Phi_t}\big)^\dagger \mu_t, \quad \forall t = t_0, \dots, T-1$,

$\mu_{t_0}$ being given by the initial condition, and the final expectation constraint:

    $\langle g, \mu_T \rangle \le a$.

Even though this problem seems linear with respect to variables $\mu_t$, the last
constraint introduces an additional highly nonlinear term in the cost function, namely:

    $\chi_{\{\langle g, \mu_T \rangle \le a\}}$,

where $\chi_A$ stands for the characteristic function^6 of set $A$. The dynamics are still
linear and variables $\mu_t$ are still positive, but the objective function is not linear with
respect to $\mu_T$ anymore, and therefore not linear with respect to the initial law $\mu_{t_0}$
either. Hence there is no reason for feedback laws to be independent of the initial
condition as in the case without constraint presented in §3.
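
A one-step toy problem makes this dependence on the initial law concrete. The model below is an assumption made for illustration only: two states (state 1 being the undesirable one), a control that resets the state to 0 at unit cost, no noise, and $g$ the indicator of state 1. Feedback laws are enumerated exhaustively.

```python
from fractions import Fraction
from itertools import product

STATES = (0, 1)     # hypothetical states; state 1 is the undesirable one
CONTROLS = (0, 1)   # control 1 resets the state to 0 and costs one unit


def f(x, u):
    """One-step dynamics: reset to 0 if u == 1, otherwise stay in place."""
    return 0 if u == 1 else x


def L(x, u):
    """Running cost: only the reset action is charged."""
    return u


def g(x):
    """Function in the final expectation constraint E[g(X_T)] <= level."""
    return 1 if x == 1 else 0


def solve(mu, level):
    """Return (optimal cost, feedback) among feedbacks phi with E[g(X_T)] <= level.

    mu[x] is the probability of starting in state x; phi[x] is the control in x.
    Ties are broken by the lexicographic order on phi.
    """
    feasible = []
    for phi in product(CONTROLS, repeat=len(STATES)):
        cost = sum(mu[x] * L(x, phi[x]) for x in STATES)
        risk = sum(mu[x] * g(f(x, phi[x])) for x in STATES)
        if risk <= level:
            feasible.append((cost, phi))
    return min(feasible)


spread = (Fraction(3, 4), Fraction(1, 4))   # a non-degenerate initial law
dirac_1 = (Fraction(0), Fraction(1))        # the state is observed to be 1
```

With the level set to 1/4, the optimal feedback in state 1 is "do nothing" under the law `spread` but "reset" under `dirac_1`. With a level of 1 (the constraint is then inactive) both laws lead to the same feedback, as in §3.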

Let us now make a remark on this initial condition. Since the information
structure is such that the state variable is fully observed, the initial condition is in fact
of a deterministic nature:

    $\mathbf{X}_{t_0} = x_{t_0}$,

where $x_{t_0}$ is a given (observed) value of the system state. The probability law
of $\mathbf{X}_{t_0}$ is accordingly the Dirac function $\delta_{x_{t_0}}$.^7 The reasoning made for the problem
initiated at time $t_0$ remains true for the subsequent problems starting at time $t_i$:
an observation $x_{t_i}$ of the state variable $\mathbf{X}_{t_i}$ becomes available before solving
Problem (4), so that its natural initial condition is in fact:

    $\mathbf{X}_{t_i} = x_{t_i}$.

Otherwise stated, the initial state probability law in each optimization problem we
consider should correspond to a Dirac function. Note that such a sequence of Dirac
functions is not driven by the Fokker-Planck equation, but is in fact associated to
some dynamics of the degenerate filter corresponding to this perfect observation
scheme. In the sequel, we assume such an initial condition for every problem we
consider.

Now, according to Lemma 2, the subsequent optimization problems formulated
at time $t_i$ will be dynamically consistent provided their initial conditions are given
by the optimal Fokker-Planck equation:

    $\mu^*_{t_0,t_i} = \big(A_{t_{i-1}}^{\Phi^*_{t_0,t_{i-1}}}\big)^\dagger \cdots \big(A_{t_0}^{\Phi^*_{t_0,t_0}}\big)^\dagger \mu_{t_0}$.

However, except for noise-free problems, such a probability law $\mu^*_{t_0,t_i}$ is in general
different from a Dirac function, which is, as already explained, the natural initial
condition for the subsequent problem starting at time $t_i$. As a conclusion, the
sequence of problems is not time consistent as long as we consider feedback laws $\Phi_t$
depending on $\mathbf{X}_t$ only.



^6 As defined in convex analysis: $\chi_A(x) = 0$ if $x \in A$, and $+\infty$ otherwise.
^7 The initial law $\mu_{t_0}$ in Problem (3) corresponds to the information available on $\mathbf{X}_{t_0}$ before
$\mathbf{X}_{t_0}$ is observed, but it seems more reasonable in a practical situation to use all the available
information when setting the problem again at each new initial time, and thus to use a Dirac
function as the initial condition.

Remark 2 (Joint probability constraints). Rather than $\mathbb{P}(g(\mathbf{X}_T) \ge b) \le a$, let us
consider a more general chance constraint of the form:

    $\mathbb{P}\big(g_t(\mathbf{X}_t) \ge b_t, \; \forall t = t_1, \dots, T\big) \le a$.

This last constraint can be modelled, like the previous one, through an expectation
constraint by introducing a new binary state variable:

    $\mathbf{Y}_{t_0} = 1$,
    $\mathbf{Y}_{t+1} = \mathbf{Y}_t \times \mathbf{1}_{\{g_{t+1}(\mathbf{X}_{t+1}) \ge b_{t+1}\}}, \quad \forall t = t_0, \dots, T-1$,

and considering constraint $\mathbb{E}(\mathbf{Y}_T) \le a$. □
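
The construction of Remark 2 can be sketched on a finite set of trajectories. The three weighted paths, the function `g` and the thresholds `b` below are hypothetical; a path is a tuple whose entry at index `t` is the state at time `t`, with index 0 standing for the initial time.

```python
from fractions import Fraction

# Hypothetical law on trajectories: path -> probability.
PATHS = {
    (0, 1, 2): Fraction(1, 2),
    (0, 1, 0): Fraction(1, 4),
    (0, -1, 2): Fraction(1, 4),
}


def g(t, x):
    """Illustrative constraint function g_t(x)."""
    return x


b = {1: 1, 2: 1}   # assumed thresholds b_t for t = 1, 2


def final_Y(path):
    """Run Y_0 = 1, Y_{t+1} = Y_t * 1{g_{t+1}(x_{t+1}) >= b_{t+1}} and return Y_T."""
    y = 1
    for t in range(1, len(path)):
        y *= 1 if g(t, path[t]) >= b[t] else 0
    return y


def joint_event(path):
    """True when g_t(x_t) >= b_t holds at every time t >= 1 along the path."""
    return all(g(t, path[t]) >= b[t] for t in range(1, len(path)))


expected_Y = sum(p * final_Y(path) for path, p in PATHS.items())
joint_probability = sum(p for path, p in PATHS.items() if joint_event(path))
```

The expectation of the final value of the binary variable equals the probability of the joint event, so the joint chance constraint becomes a final expectation constraint on the augmented state.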

4.2. Back to time consistency. We now show that time consistency can be 
recovered provided we choose the right state variable on which to base decisions. 
We hence establish a link between time consistency of a family of optimization 
problems and the notion of state variable. 

We claim that a better-suited state variable for the family of problems with
final time expectation constraint introduced above is the probability law of the
variable $\mathbf{X}$. Let us denote by $V_t(\mu_t)$ the optimal cost of the problem starting at
time $t$ with initial condition $\mu_t$. Using notations of the distributed formulation of
a SOC problem, one can write a DP equation depending on the probability laws $\mu$
on $\mathbb{X}$:

    $V_T(\mu) = \langle K, \mu \rangle$ if $\langle g, \mu \rangle \le a$, and $V_T(\mu) = +\infty$ otherwise,

and, for every $t = t_0, \dots, T-1$ and every probability law $\mu$ on $\mathbb{X}$:

    $V_t(\mu) = \min_{\Phi_t} \; \langle \Lambda_t^{\Phi_t}, \mu \rangle + V_{t+1}\big( (A_t^{\Phi_t})^\dagger \mu \big)$.

The context is similar to the one of the deterministic example of §2, and Lemma 3
states that solving the deterministic infinite-dimensional problem associated with
the constrained problem leads to time consistency provided DP is used. For the
problem under consideration, we thus obtain optimal feedback functions $\Phi_t$ which
depend on the probability laws $\mu_t$. Otherwise stated, the family of constrained
problems introduced in §4.1 is time consistent provided one looks for strategies
as feedback functions depending on both the variable $\mathbf{X}_t$ and the probability law
of $\mathbf{X}_t$.

Naturally, this DP equation is rather conceptual. The resolution of such an
equation is intractable in practice since probability laws $\mu_t$ are infinite-dimensional
objects.
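
On a finite state space and a short horizon, the DP equation on probability laws can nevertheless be evaluated by brute force, which helps to see its structure. The sketch below uses assumed data (two states, two controls, a two-point noise law, a reset control of unit cost, $K = 0$, $g$ the indicator of state 1 and level 1/2). Laws are tuples indexed by the state, and `float("inf")` stands for $+\infty$.

```python
from fractions import Fraction
from itertools import product

STATES, CONTROLS = (0, 1), (0, 1)                  # hypothetical finite sets
NOISE = {0: Fraction(1, 2), 1: Fraction(1, 2)}     # assumed law of each noise
T, LEVEL = 2, Fraction(1, 2)                       # final time and level a
INF = float("inf")


def f(x, u, w):
    """Reset to 0 if u == 1; otherwise the state becomes 1 as soon as x or w is 1."""
    return 0 if u == 1 else max(x, w)


def L(x, u, w):
    """Running cost: one unit per reset."""
    return u


def K(x):
    """Final cost (zero in this illustration)."""
    return 0


def g(x):
    """Constraint function: indicator of state 1."""
    return x


def pairing(psi, mu):
    """<psi, mu> = E[psi(X)] when X has law mu."""
    return sum(psi(x) * mu[x] for x in STATES)


def propagate(phi, mu):
    """Fokker-Planck step under feedback phi (phi[x] is the control in state x)."""
    out = [Fraction(0)] * len(STATES)
    for x in STATES:
        for w, p in NOISE.items():
            out[f(x, phi[x], w)] += mu[x] * p
    return tuple(out)


def stage_cost(phi, mu):
    """<Lambda^phi, mu>: expected running cost when the state has law mu."""
    return sum(mu[x] * p * L(x, phi[x], w) for x in STATES for w, p in NOISE.items())


def V(t, mu):
    """DP equation on laws: V_T is <K, mu> if <g, mu> <= LEVEL, +inf otherwise."""
    if t == T:
        return pairing(K, mu) if pairing(g, mu) <= LEVEL else INF
    return min(stage_cost(phi, mu) + V(t + 1, propagate(phi, mu))
               for phi in product(CONTROLS, repeat=len(STATES)))
```

The argument of `V` is the whole law of the state, not a single state value: this is the enlarged state variable discussed above. The enumeration over feedbacks grows exponentially with the number of states, which reflects the conceptual nature of this equation.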

5. Conclusion 

We informally introduced a notion of time consistency of a sequence of decision- 
making problems, which basically requires that plans that are made from the very 
first time remain optimal if one rewrites optimization problems at subsequent time 
steps. We show that, for several classes of optimal control problems, this concept 
is not new and can be directly linked with the notion of state variable, which is the 
minimal information one must use to be able to take the optimal decision. 

We show that, in general, feedback laws have to depend on the probability law 
of the usual state variable for Stochastic Optimal Control problems to be time 
consistent. This is necessary, for example, when the model contains expectation or 
chance constraints. 

Future work will focus on three main directions. The first concern will be to
better formalize the state notion in the vein of the works by Witsenhausen (1971,
1973) and Whittle (1982). The second will be to establish the link with the
literature concerning risk measures, in particular the work by Ruszczynski (2009).
Finally, the last DP equations we introduced are in general intractable. In a
forthcoming paper, we will provide a way to get back to a finite-dimensional information
variable, which makes a resolution by DP tractable.